# require = will install and load, library can only load installed
# packages
require(tidyverse)
require(ggplot2) # pretty plots
require(dplyr) # dataframe manipulations and %>%
require(infer) # statistics
require(tidyr) # nice tables that look nice
require(car) # ANOVA, also used by the authors for ANOVA
require(mosaic) # histogram, instead of hist (contains {lattice})
require(formatR) #to format R
require(knitr) # inserting images
require(kableExtra) # modifying kable tables
require(skimr) # for data overview, data visualization
require(data.table) #
[Include a brief summary of the paper you are reanalyzing data from (e.g., the overall objective of the paper, the types of data collected and how sampling was done, what the main results were) and lay out what you intend to replicate.]
The southern Darwin’s frog (Rhinoderma darwinii) mouth-brooding frog species endemic to Chilean Patagonia. R. darwinii has a fascinating method for reproduction: the female (egg-producer) deposits eggs and the male (sperm-producer) fertilizes them, then guards them until the embryos are visibly wriggling inside. Then the male transfers those eggs to their vocal sac where they will be safe from predators and receive nutrients from the male (similar to marsupial embryonic development in their pouch). Ultimately, the male broods these fertilized eggs, often from multiple clutches with different partners, in their vocal sacs for about 50 days, until fully developed froglets emerge by pushing their way out the male’s mouth! All a while during this gestation period, the male continues to call for mates (Sandmeier, 2016)! There are 3 sex roles in this species: pregnant male (MP), non-pregnant male (M or NPM), and female (H or F), and the main sexually dimorphic characteristic is size, where the egg-producers (F) are slightly larger. In this nearly sexually monomorphic (i.e., the sexes look physically identical) species, mate selection depends on advertising calls, and R. darwinii is one of the few species where all sexes (PM, NPM, and F) use advertising calls to attract the attention of a potential mate (Serrano et al., 2020).
To investigate the factors modulating monomorphic vs. dimorphic sexual signals in the R. darwinii, Serrano et al. (2020) hypothesized species with temporally variable periods of intrasexual competition will have monomorphic instead of dimorphic sexual signals to ensure conspecific recognition. In R. darwinii, they specifically focus on timing/availability of advertising calls for mate attraction used by calling (pregnant and non-pregnant) males and calling females (i.e., vocal phenology) (Serrano et al., 2020).
They investigated this with a population of southern R. darwinii on Chiolé Island in Southern Chile during mating season (October 2015-February 2016). Of the 156 frogs caught in the field (118 adults), they recorded calls from 32 of them (Plot1) In the field, they first recorded individual calls (tracks) then collected population monitoring data (snout-vent length, SVL (mm); weight (g)). They also collected data on the sex and sexual status (MP = pregnant male, H = female, M = non-pregnant male) of each frog caught using body size and morphological characteristics. For the pregnant males, they counted externally the number of larvae in the vocal sacs. For each call recordings, they measured the call repetition rate (CRR, number of calls made in a 5 min period after the first call produced), the sound pressure level (SPL, dB), call duration (CD, seconds), the number of notes per call (NC), the note duration (ND, ms), the dominant frequency of the call (DF, Hz); and the amplitude of each vocalization (root mean square (RMS) amplitude) (Serrano et al., 2020).
The Serrano et al. (2020) used Simple Pearson correlations to explore the association used to explore association between acoustic variables of the calls (CRR, SPL, CD, NC, ND, DF) with physical characteristic variables (SVL, weight) (data not shown in article). They used Spearman correlations to determine the association between acoustic properties of the calls of pregnant males and the number of tadpoles in their vocal sacs (data not shown in article). The authors used an ANOVA to investigate the differences in acoustic variables depended on the relative body size of the sexes (Table 1), then they conducted a discriminant function analysis (DFA) for the note duration, dominant frequency, sound pressure level, and aggregated entropy of 4-notes calls (the most common call structure), then analyzed using two linear discriminant vectors (LD) (Figure 4) (Serrano et al., 2020).
Ultimately, Serrano et al. (2020) found females, who have larger body size (SVL and weight) than both male sexes, had longer note duration and call duration than any males (Table 1, Figure 3), and, even when data was corrected for SVL, no acoustic variable had a significant difference between the sexes, based on their ANOVA (Table 1), suggesting body size explains any variation among the acoustic variables and a monomorphic call structure. They also found a negative relationship between number of tadpoles in the vocal sacs of pregnant males and their call amplitude, indicating the advertising call of pregnant males is attenuated by an increased number of tadpoles AND could be an auditory indicator of tadpoles thus could be used for selecting a mate (data not shown; just very cool! This is something several other students and I studied with the first author of this paper in the field in 2019)! Ultimately, they propose mutual selective pressures of the different sexes might contribute to social recognition and sexual status recognition (i.e., recognition of reporoductive state, active or not) of individuals (Serrano et al., 2020).
I will be replicating the descriptive statistic analysis of the qqPlots, Simple Pearson Correlation, Simple Spearman correlation to explore normality trends in the data, just as the authors did, and ANOVA presented in Table 1 of Serrano et al. (2020). THen I will be replicating the Inferential statistical analysis of acoustic variables note duration, call duration, and dominant frequency (figure 3). Lastly, I will replicate the linear discrimination analysis of 5 acoustic variables (Dominant Frequency, Call Rate, Note Duration, Aggregated entropy, and Sound Pressure Level) (figure 4).
load in dataset:
f <- "https://raw.githubusercontent.com/slcornett/data-analysis-replication/main/data/Serrano_et_al_2020_MPMH.csv"
d <- read_csv(f, col_names = TRUE, show_col_types = FALSE) # show column names, hides dataframe message details
d <- d %>%
mutate(CRR = (Calls5min/5)) # creating calls per minute call repetition rate for ease in future calculations
# Data summary
s <- skim(d) # to make a summary table of skim(d) output
# character data
s %>% dplyr::filter(skim_type == "character") %>%
dplyr::select(skim_variable, n_missing, character.min, character.max, character.empty, character.n_unique) %>%
kable(align = 'l', booktabs = TRUE) %>%
kable_styling(font_size = 11, full_width = TRUE)
| skim_variable | n_missing | character.min | character.max | character.empty | character.n_unique |
|---|---|---|---|---|---|
| Nombre | 0 | 3 | 15 | 0 | 30 |
| Captura | 0 | 3 | 4 | 0 | 31 |
| Sex | 0 | 7 | 18 | 0 | 3 |
| Sexo | 0 | 1 | 2 | 0 | 3 |
| Track | 0 | 4 | 4 | 0 | 31 |
# numeric data modify the specific output
s %>% dplyr::filter(skim_type == "numeric") %>%
dplyr::select(skim_variable, n_missing, numeric.mean, numeric.sd, numeric.p0, numeric.p25, numeric.p50, numeric.p75, numeric.p100, numeric.hist) %>%
kable(align = 'l', booktabs = TRUE) %>% # left aligned data
kable_styling(font_size = 11, full_width = TRUE) # font-size
| skim_variable | n_missing | numeric.mean | numeric.sd | numeric.p0 | numeric.p25 | numeric.p50 | numeric.p75 | numeric.p100 | numeric.hist |
|---|---|---|---|---|---|---|---|---|---|
| Calls5min | 0 | 7.290323e+00 | 2.734880e+00 | 3.000000 | 5.000000 | 8.000000e+00 | 9.000000e+00 | 1.200000e+01 | ▅▅▇▅▅ |
| Temp | 22 | 1.938889e+01 | 3.077923e+00 | 15.400000 | 18.200000 | 1.900000e+01 | 2.030000e+01 | 2.560000e+01 | ▅▇▅▂▂ |
| HR | 22 | 7.883333e+01 | 4.238514e+00 | 71.500000 | 75.100000 | 7.920000e+01 | 8.210000e+01 | 8.420000e+01 | ▂▃▂▂▇ |
| Larvas | 20 | 3.090909e+00 | 1.221028e+00 | 2.000000 | 2.000000 | 3.000000e+00 | 4.000000e+00 | 5.000000e+00 | ▇▃▁▃▃ |
| LHC | 1 | 2.242667e+01 | 9.351132e-01 | 20.600000 | 21.800000 | 2.235000e+01 | 2.307500e+01 | 2.520000e+01 | ▂▇▅▂▁ |
| Peso | 1 | 9.356667e-01 | 1.389042e-01 | 0.680000 | 0.837500 | 9.200000e-01 | 1.045000e+00 | 1.190000e+00 | ▇▆▇▆▇ |
| SPL | 6 | 6.336860e+01 | 4.117350e+00 | 53.418182 | 61.700000 | 6.322143e+01 | 6.631667e+01 | 7.143103e+01 | ▂▃▇▇▃ |
| NC | 0 | 1.474194e+01 | 7.763022e+00 | 4.000000 | 8.000000 | 1.400000e+01 | 1.800000e+01 | 3.300000e+01 | ▆▇▃▃▂ |
| ND | 0 | 1.684006e-01 | 3.099900e-02 | 0.115131 | 0.142706 | 1.624118e-01 | 1.890287e-01 | 2.565455e-01 | ▅▇▃▃▁ |
| CD | 3 | 1.670783e+00 | 2.164759e-01 | 1.311454 | 1.510821 | 1.636583e+00 | 1.838234e+00 | 2.141000e+00 | ▅▇▅▃▃ |
| Agg Entropy | 0 | 2.106072e+00 | 5.376220e-01 | 1.379379 | 1.666791 | 1.942529e+00 | 2.477863e+00 | 3.161019e+00 | ▇▆▃▂▃ |
| Avg Entropy | 0 | 1.970192e+00 | 3.396755e-01 | 1.524621 | 1.723484 | 1.851720e+00 | 2.123599e+00 | 2.721850e+00 | ▇▇▃▃▂ |
| DF | 0 | 3.597235e+03 | 2.901571e+02 | 2995.193548 | 3389.708648 | 3.617600e+03 | 3.793418e+03 | 4.272157e+03 | ▂▆▇▅▂ |
| RMS Amp | 0 | 1.005410e+05 | 7.949927e+04 | 12731.900000 | 53886.150000 | 6.422570e+04 | 1.228883e+05 | 3.560384e+05 | ▇▃▁▁▁ |
| CRR | 0 | 1.458065e+00 | 5.469760e-01 | 0.600000 | 1.000000 | 1.600000e+00 | 1.800000e+00 | 2.400000e+00 | ▅▅▇▅▅ |
detach(package:skimr)
DESCRIPTIONS/IDENTIFIERS:
• Nombre = name given to ID the frog;
• Captura = number captured/image name;
• Track = the call recording track
PHYSICAL MEASUREMENTS:
• sexo = sexual status (MP = pregnant male, H = female, M = non-pregnant male)
• peso = mass (g);
• LHC = longitud hocico a cola (snout-vent length, mm);
• temp = temperature (ºC);
• HR = relative humidity;
• larvas = number of tadpoles/offspring in their parent’s mouth!
ACOUSTIC VARIABLES:
• Calls5min = calls in 5 min interval;
• SPL = Sound Pressure Level (dB);
• NC = number of notes/call;
• ND = note duration (either s or ms);
• CD = call duration (s);
• Agg Entropy = aggregate entropy;
• Avg Entropy = average entropy;
• DF = dominant frequency of call (Hz);
• RMS Amp = Root Mean Square Amplitude
• CRR = Call Repetition Rate (Calls/min, i.e., Calls5min/5)
require(cowplot) # for grid plotting
# not included in article as a graph but i think it's nice to show for reference to the methods
p1 <- ggplot(data = d, # object created
aes(x = Sex, # sex of individuals in the dataset
y = LHC)) + # y = SVL, one value dropped b/c NA
geom_boxplot(na.rm = TRUE, show.legend = FALSE, aes(color = Sex)) +
labs(x = "Sex of Frog at time of Capture", y = "Snout-Vent Length (mm)") +
ggtitle("Body Size of R. Darwinii Frogs Incluced in this Study") +
theme_classic() + # don't need the legend because x is categorical
theme(plot.title = element_text(size = 10, color = "dark green", face = "bold"),
axis.title.x = element_text(size = 8, color = "dark green"),
axis.title.y = element_text(size = 8, color = "dark green"))
# CRR by Sex
p2 <- ggplot(data = d, aes(x = Sex, y = CRR, color = Sex)) +
geom_boxplot(na.rm = TRUE, show.legend = FALSE) + # don't need the legend because x is categorical
labs(x = "Sex", y = "Calls/min") +
ggtitle("Call Repetition Rate by Sex") +
theme_classic() +
theme(plot.title = element_text(size = 10, color = "dark green", face = "bold"),
axis.title.x = element_text(size = 8, color = "dark green"),
axis.title.y = element_text(size = 8, color = "dark green"))
# Notes per Call (NC) vs Sex
p3 <- ggplot(data = d, aes(x = Sex, y = NC, color = Sex)) +
geom_boxplot(na.rm = TRUE, show.legend = FALSE) + # don't need the legend because x is categorical
labs(x = "Sex", y = "Notes per Call") +
ggtitle("Notes per Call by Sex") +
theme_classic() +
theme(plot.title = element_text(size = 10, color = "dark green",face = "bold"),
axis.title.x = element_text(size = 8, color = "dark green"),
axis.title.y = element_text(size = 8, color = "dark green"))
# Call Repetition Rate (CRR) histogram
p4 <- ggplot(data = d, aes(x = CRR, color = Sex)) +
geom_histogram(na.rm = TRUE, show.legend = FALSE) + facet_wrap( ~ Sex) +
labs(x = "Calls per Minute", y = "Frequency of Call Rate") +
ggtitle("Histogram of Call Repetition Rate") +
theme_classic() +
theme(plot.title = element_text(size = 10, color = "dark green", face = "bold"),
axis.title.x = element_text(size = 8, color = "dark green"),
axis.title.y = element_text(size = 8, color = "dark green"))
# pregnant males : number of tadpoles vs call amplitude
p5 <- ggplot(data = d, aes(x = as.factor(Larvas), y = `RMS Amp`, color = Larvas)) +
geom_boxplot(na.rm = TRUE, show.legend = FALSE) + # filtering out all but pregnant males because other sexes don't have tadpoles.
xlab("Number of Tadpoles") +
ylab("Call Amplitude (dB)") +
ggtitle("Pregnant Males: Number of Tadpoles vs Call Amplitude") +
theme_classic() +
theme(plot.title = element_text(size = 10, color = "dark green", face = "bold"),
axis.title.x = element_text(size = 8, color = "dark green"),
axis.title.y = element_text(size = 8, color = "dark green"))
# plotting the graphs together
plot_grid(p1, p2, p3, p4, p5, label_size = 8, nrow = 3)
Be sure to thoroughly explain what replications you are doing and comment your code so that it is easy for a reader to understand.
Include in this section relevant tables/figures/values from the original paper for comparison to what you accomplished with your replication.
Note that I want you to do the bulk of any exposition using text and markdown syntax outside of code blocks. That is, your document should not just be one big code block with R style comments but rather a nicely formatted report with code separated from exposition, interpretation, and discussion.]
# making a dataframe of the relevant variables so don't change the original dataset.
df<- d %>% dplyr::select(Nombre, Sex, Larvas, LHC, Peso, Calls5min, CRR, NC, ND, CD, DF, SPL, `RMS Amp`, `Agg Entropy`, `Avg Entropy`)
# renaming Sex variables in dataframe so will print nicer in the Tukey post-hoc. yes I could just use Sexo but I don't have to change all the code below if i do it like this. I didn't realize the labels were going to be a problem until after I wrote all the code for replication. Also now my visualization labels will match those used in the paper.
df <- df %>%
mutate(Sex = case_when(Sex == "Females" ~ "F",
Sex == "Non-pregnant males" ~ "NPM",
Sex == "Pregnant males" ~ "PM"))
print(df) %>%
kable(align = 'l', booktabs = TRUE) %>% # left aligned data
kable_styling(font_size = 10, full_width = TRUE) # font-size# checking work
## # A tibble: 31 × 15
## Nombre Sex Larvas LHC Peso Calls5min CRR NC ND CD DF SPL
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 "Euge… F NA 23.4 1.15 8 1.6 15 0.257 2.14 3341. 71.4
## 2 "Mich… NPM NA 22.4 NA 4 0.8 17 0.156 1.56 3402. 63.2
## 3 "Dr\x… F NA 23.7 1.1 5 1 12 0.196 2.01 3686. NA
## 4 "Herl… NPM NA 22.7 0.92 5 1 33 0.182 1.72 3488. 62.8
## 5 "Luis… F NA 23.2 1.06 11 2.2 14 0.157 NA 3682. 64.2
## 6 "Fema… PM 5 22.6 1.11 3 0.6 14 0.176 1.74 3634. 62.4
## 7 "Obam… PM 2 22.4 0.93 9 1.8 22 0.144 1.54 3230. 68.3
## 8 "Edga… PM 3 22.2 0.92 8 1.6 17 0.133 1.42 4272. NA
## 9 "Olli… PM 2 22.2 0.86 8 1.6 14 0.209 1.92 3818. NA
## 10 "Rube… PM 2 21.6 0.86 3 0.6 4 0.162 1.64 3273 67.6
## # … with 21 more rows, and 3 more variables: `RMS Amp` <dbl>,
## # `Agg Entropy` <dbl>, `Avg Entropy` <dbl>
| Nombre | Sex | Larvas | LHC | Peso | Calls5min | CRR | NC | ND | CD | DF | SPL | RMS Amp | Agg Entropy | Avg Entropy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Eugenia | F | NA | 23.4 | 1.15 | 8 | 1.6 | 15 | 0.2565455 | 2.141000 | 3340.876 | 71.43103 | 162764.3 | 1.379379 | 1.524621 |
| Michelle | NPM | NA | 22.4 | NA | 4 | 0.8 | 17 | 0.1556250 | 1.557000 | 3402.231 | 63.22143 | 89155.9 | 2.427146 | 2.031938 |
| Dr<e1>cula | F | NA | 23.7 | 1.10 | 5 | 1.0 | 12 | 0.1964600 | 2.008200 | 3686.474 | NA | 53409.7 | 1.959700 | 1.851720 |
| Herlinda | NPM | NA | 22.7 | 0.92 | 5 | 1.0 | 33 | 0.1815974 | 1.723222 | 3487.814 | 62.77895 | 110800.1 | 2.882636 | 2.690169 |
| Luisa | F | NA | 23.2 | 1.06 | 11 | 2.2 | 14 | 0.1571250 | NA | 3682.167 | 64.16923 | 43954.0 | 1.926542 | 1.766208 |
| Femalesrancisco | PM | 5 | 22.6 | 1.11 | 3 | 0.6 | 14 | 0.1764444 | 1.737125 | 3633.539 | 62.37500 | 62616.0 | 2.544982 | 2.458500 |
| Obama | PM | 2 | 22.4 | 0.93 | 9 | 1.8 | 22 | 0.1435536 | 1.542750 | 3229.946 | 68.26000 | 219744.9 | 1.852411 | 1.757268 |
| Edgar | PM | 3 | 22.2 | 0.92 | 8 | 1.6 | 17 | 0.1327429 | 1.419000 | 4272.157 | NA | 58691.5 | 1.734871 | 1.767586 |
| Ollin | PM | 2 | 22.2 | 0.86 | 8 | 1.6 | 14 | 0.2088393 | 1.920333 | 3817.511 | NA | 105319.3 | 3.143518 | 2.380518 |
| Ruben | PM | 2 | 21.6 | 0.86 | 3 | 0.6 | 4 | 0.1624118 | 1.643667 | 3273.000 | 67.56667 | 29010.9 | 1.593118 | 1.594235 |
| Tehuacan | NPM | NA | 21.8 | 0.83 | 11 | 2.2 | 8 | 0.2013824 | 1.629500 | 3951.965 | 58.16000 | 40606.4 | 1.942529 | 1.919500 |
| Esteban | NPM | NA | 20.6 | 0.78 | 9 | 1.8 | 10 | 0.1418584 | 1.311454 | 3840.141 | 67.59000 | 356038.4 | 2.610080 | 2.721850 |
| Femaleslash | NPM | NA | 21.8 | 0.76 | 11 | 2.2 | 19 | 0.1336744 | 1.627500 | 3377.186 | NA | 64225.7 | 1.739302 | 1.684233 |
| 267 | PM | 5 | 23.4 | 1.09 | 12 | 2.4 | 15 | 0.1678679 | 1.543800 | 4059.596 | 56.92857 | 73951.4 | 3.161019 | 2.587359 |
| Cirilo | PM | 3 | 22.6 | 0.94 | 4 | 0.8 | 5 | 0.1598000 | 1.688200 | 3445.300 | NA | 54693.2 | 1.606550 | 1.788800 |
| Abel | PM | 2 | 21.7 | 1.00 | 7 | 1.4 | 17 | 0.1519701 | 1.549688 | 3617.600 | 69.14000 | 140005.6 | 1.561761 | 1.645627 |
| Izcoatl | PM | 2 | 22.3 | 0.90 | 4 | 0.8 | 7 | 0.1414857 | NA | 3652.040 | 64.54286 | 106579.7 | 1.791514 | 1.915029 |
| Ismael | NPM | NA | NA | 0.74 | 8 | 1.6 | 11 | 0.1312581 | 1.386143 | 3923.200 | NA | 49986.7 | 1.601355 | 1.717290 |
| Victoria | F | NA | 23.6 | 1.11 | 8 | 1.6 | 8 | 0.1626774 | 1.518571 | 2995.194 | 58.87500 | 12731.9 | 2.528581 | 2.046064 |
| Victor | NPM | NA | 23.2 | 0.86 | 5 | 1.0 | 13 | 0.1403438 | 1.403000 | 3962.100 | 62.73750 | 55335.5 | 1.552844 | 1.688219 |
| Melchor | NPM | NA | 21.6 | 0.75 | 5 | 1.0 | 27 | 0.1409467 | 1.487571 | 3929.941 | 64.51000 | 134976.5 | 1.685573 | 1.697280 |
| Femalesrancisco | NPM | NA | 22.7 | 0.80 | 6 | 1.2 | 8 | 0.1151310 | NA | 3769.326 | 61.84667 | 189382.9 | 2.258976 | 1.729679 |
| Xavier | NPM | NA | 21.3 | 0.68 | 10 | 2.0 | 19 | 0.1580547 | 1.478333 | 3649.885 | 59.92400 | 311815.6 | 1.474633 | 1.605016 |
| Ernesto | NPM | NA | 21.8 | 0.78 | 6 | 1.2 | 15 | 0.1791346 | 1.850800 | 3538.088 | 53.41818 | 54669.0 | 1.980154 | 2.021135 |
| 350 | F | NA | 21.9 | 0.90 | 7 | 1.4 | 8 | 0.1615600 | 2.005500 | 3341.940 | 64.53750 | 51199.9 | 2.132840 | 1.987320 |
| 352 | F | NA | 23.2 | 1.13 | 12 | 2.4 | 12 | 0.2103800 | 1.862300 | 3593.478 | 61.70000 | 86866.9 | 2.416900 | 2.276980 |
| Ele | F | NA | 22.1 | 0.92 | 10 | 2.0 | 33 | 0.2107479 | 1.707458 | 3254.204 | 66.65769 | 72817.7 | 2.219202 | 2.009328 |
| Lucrecia | F | NA | 22.7 | 1.00 | 9 | 1.8 | 22 | 0.1979552 | 1.981750 | 3113.609 | 65.52143 | 42238.9 | 3.146298 | 2.457508 |
| Socorro | F | NA | 25.2 | 1.19 | 7 | 1.4 | 7 | 0.1683000 | 1.761000 | 3508.483 | 62.15000 | 54362.6 | 2.919900 | 2.201133 |
| ChiFemales<fa> | PM | 4 | 20.9 | 1.00 | 8 | 1.6 | 27 | 0.2086509 | 1.834045 | 3547.705 | 59.85652 | 166881.2 | 1.648009 | 1.775726 |
| Silvio | PM | 4 | 22.0 | 1.00 | 3 | 0.6 | 4 | 0.1658947 | 1.463000 | 3617.600 | 66.31667 | 61938.0 | 1.865895 | 1.778105 |
# dividing df by sex so easier to access variables. Commented out the ones i don't actually need in the replication.
# np.males <- df %>% dplyr::filter(Sex == "Non-pregnant males") #Non-pregnant males (N = 11)
# head(np.males)
p.males <- df %>% dplyr::filter(Sex == "PM") # Pregnant males (N = 11)
head(p.males)
## # A tibble: 6 × 15
## Nombre Sex Larvas LHC Peso Calls5min CRR NC ND CD DF SPL
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Female… PM 5 22.6 1.11 3 0.6 14 0.176 1.74 3634. 62.4
## 2 Obama PM 2 22.4 0.93 9 1.8 22 0.144 1.54 3230. 68.3
## 3 Edgar PM 3 22.2 0.92 8 1.6 17 0.133 1.42 4272. NA
## 4 Ollin PM 2 22.2 0.86 8 1.6 14 0.209 1.92 3818. NA
## 5 Ruben PM 2 21.6 0.86 3 0.6 4 0.162 1.64 3273 67.6
## 6 267 PM 5 23.4 1.09 12 2.4 15 0.168 1.54 4060. 56.9
## # … with 3 more variables: `RMS Amp` <dbl>, `Agg Entropy` <dbl>,
## # `Avg Entropy` <dbl>
# females <- df %>% dplyr::filter(Sex == "Females") # Females (N = 9, according to the article, N=10, but the dataset only has 9.)
# head(females)
[descriptive statistical analysis]
The authors reported performing the following statistical analyses before performing their ANOVA. As mentioned before, they used Simple Pearson Correlations to determine if the different acoustic variables correlate with either Snout-Vent Length (LHC, mm) or mass (g) (Serrano et al., 2020). They also performed Spearman Correlations to determine an association between acoustic variables and the number of tadpoles in the vocal sacs of pregnant males (Serrano et al., 2020). Below, I repeat these tests.
# SHOULD I TAKE THESE OUT? THEY AREN'T VERY INFORMATIVE.
# Pearson correlations between snout-vent length and acoustic variables
# Snout Vent Length (LHC, mm) vs Note Duration (s)
stats::cor.test(df$LHC, df$ND, method = "pearson") # no correlation
##
## Pearson's product-moment correlation
##
## data: df$LHC and df$ND
## t = 0.9008, df = 28, p-value = 0.3754
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.2048321 0.4979823
## sample estimates:
## cor
## 0.1678215
# Snout Vent Length (LHC, mm) vs Call Duration (s)
stats::cor.test(df$LHC, df$CD, method = "pearson") # no correlation
##
## Pearson's product-moment correlation
##
## data: df$LHC and df$CD
## t = 1.5882, df = 25, p-value = 0.1248
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.08731632 0.61231257
## sample estimates:
## cor
## 0.3027431
# Snout Vent Length (LHC, mm) vs Dominant Frequency (Hz)
stats::cor.test(df$LHC, df$DF, method = "pearson")
##
## Pearson's product-moment correlation
##
## data: df$LHC and df$DF
## t = -0.60019, df = 28, p-value = 0.5532
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.4545167 0.2580443
## sample estimates:
## cor
## -0.1127023
# Snout Vent Length (LHC, mm) vs Aggregate Entropy
stats::cor.test(df$LHC, df$`Agg Entropy`, method = "pearson") # near correlation (p = 0.058)
##
## Pearson's product-moment correlation
##
## data: df$LHC and df$`Agg Entropy`
## t = 1.9699, df = 28, p-value = 0.05881
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.01301586 0.62997452
## sample estimates:
## cor
## 0.3488894
stats::cor.test(log(df$LHC), log(df$`Agg Entropy`), method = "pearson") # log transformation does not help.
##
## Pearson's product-moment correlation
##
## data: log(df$LHC) and log(df$`Agg Entropy`)
## t = 1.875, df = 28, p-value = 0.07127
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.02987812 0.61969111
## sample estimates:
## cor
## 0.3339862
# Snout Vent Length (LHC, mm) vs Avg Entropy
stats::cor.test(df$LHC, df$`Avg Entropy`, method = "pearson") #no correlation
##
## Pearson's product-moment correlation
##
## data: df$LHC and df$`Avg Entropy`
## t = 0.71073, df = 28, p-value = 0.4831
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.2385911 0.4708103
## sample estimates:
## cor
## 0.1331208
# Snout Vent Length (LHC, mm) vs Root Mean Squares of call Amplitude
stats::cor.test(df$LHC, df$`RMS Amp`, method = "pearson") # CORRELATED! (p = 0.01001)
##
## Pearson's product-moment correlation
##
## data: df$LHC and df$`RMS Amp`
## t = -2.7628, df = 28, p-value = 0.01001
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.7054719 -0.1230934
## sample estimates:
## cor
## -0.4628373
# Snout Vent Length (LHC, mm) vs Call Repetition Rate (calls/min)
stats::cor.test(df$LHC, df$CRR, method = "pearson") # super no correlation
##
## Pearson's product-moment correlation
##
## data: df$LHC and df$CRR
## t = 0.076324, df = 28, p-value = 0.9397
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3476532 0.3727548
## sample estimates:
## cor
## 0.01442243
# Pearson correlations between weight and acoustic variables Weight (g) vs
# Note Duration (s)
stats::cor.test(df$Peso, df$ND, method = "pearson") # CORRELATED! (p = 0.006363)
##
## Pearson's product-moment correlation
##
## data: df$Peso and df$ND
## t = 2.9494, df = 28, p-value = 0.006363
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1535281 0.7207203
## sample estimates:
## cor
## 0.486868
# Weight (g) vs Call Duration (s)
stats::cor.test(df$Peso, df$CD, method = "pearson") # CORRELATED! (p = 0.0185)
##
## Pearson's product-moment correlation
##
## data: df$Peso and df$CD
## t = 2.5199, df = 25, p-value = 0.0185
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.08448738 0.70883635
## sample estimates:
## cor
## 0.4500518
# Weight (g) vs Dominant Frequency (Hz)
stats::cor.test(df$Peso, df$DF, method = "pearson") # no correlation
##
## Pearson's product-moment correlation
##
## data: df$Peso and df$DF
## t = -1.4735, df = 28, p-value = 0.1518
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.5731400 0.1018496
## sample estimates:
## cor
## -0.268263
# Weight (g) vs Aggregate Entropy of call
stats::cor.test(df$Peso, df$`Agg Entropy`, method = "pearson") # not correlated
##
## Pearson's product-moment correlation
##
## data: df$Peso and df$`Agg Entropy`
## t = 1.812, df = 28, p-value = 0.08073
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.04110161 0.61272059
## sample estimates:
## cor
## 0.3239648
stats::cor.test(log(df$Peso), log(df$`Agg Entropy`), method = "pearson") # log transformation did not help
##
## Pearson's product-moment correlation
##
## data: log(df$Peso) and log(df$`Agg Entropy`)
## t = 1.8005, df = 28, p-value = 0.08257
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.0431561 0.6114335
## sample estimates:
## cor
## 0.3221214
# Weight (g) vs Average Entropy of call
stats::cor.test(df$Peso, df$`Avg Entropy`, method = "pearson") # no correlation
##
## Pearson's product-moment correlation
##
## data: df$Peso and df$`Avg Entropy`
## t = 1.275, df = 28, p-value = 0.2128
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.1376277 0.5482556
## sample estimates:
## cor
## 0.2342567
# Weight (g) vs Root Mean Squares of call Amplitude
stats::cor.test(df$Peso, df$`RMS Amp`, method = "pearson") # no correlation, but very close (p = 0.05853)
##
## Pearson's product-moment correlation
##
## data: df$Peso and df$`RMS Amp`
## t = -1.9723, df = 28, p-value = 0.05853
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.63022222 0.01260513
## sample estimates:
## cor
## -0.3492501
# Weight (g) vs Call Repetition Rate (calls/min)
stats::cor.test(df$Peso, df$CRR, method = "pearson") # no correlation at all
##
## Pearson's product-moment correlation
##
## data: df$Peso and df$CRR
## t = 0.11242, df = 28, p-value = 0.9113
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3416424 0.3786131
## sample estimates:
## cor
## 0.02124125
# Spearman correlations: number of tadpoles in pregnant males' vs acoustic
# variables Number tadpoles vs note duration (s)
stats::cor(p.males$Larvas, p.males$ND, method = "spearman") # no correlation
## [1] 0.4227058
# Number tadpoles vs call duration (s)
stats::cor(p.males$Larvas, p.males$CD, method = "spearman") # no correlation
## [1] NA
# Number tadpoles vs dominant frequency (Hz)
stats::cor(p.males$Larvas, p.males$DF, method = "spearman") # no correlation
## [1] 0.2960874
# Number tadpoles vs Aggregate Entropy of call
stats::cor(p.males$Larvas, p.males$`Agg Entropy`, method = "spearman") # no correlation
## [1] 0.4419197
# Number tadpoles vs Average Entropy of call
stats::cor(p.males$Larvas, p.males$`Avg Entropy`, method = "spearman") # no correlation
## [1] 0.5620066
# Number tadpoles vs Aggregate Entropy of call
stats::cor(p.males$Larvas, p.males$`RMS Amp`, method = "spearman") # no correlation
## [1] -0.201746
# Number tadpoles vs Call Repetition Rate (calls/min)
stats::cor(p.males$Larvas, p.males$CRR, method = "spearman") # no correlation
## [1] -0.01716697
Serrano et al. (2020) first verified the normality criteria for all the variables in their data using quantile-quantile plots, likely to determine if any of their variables needed to be transformed before analyzing them with ANOVAs. Though they do not show these plots in their paper, I am replicating them below because the authors do not report which variables were transformed nor do they report how they were transformed, other than stating they used the Box Cox transformation.
require(ggpubr) # for qqplots
require(cowplot) # for wrapping the plots
# QQ Plots, will show warnings because there are NAs in the data. this is
# fine.
qp1 <- ggqqplot(df$LHC, title = "Snout-Vent Length (mm) QQ Plot") + font("title",
size = 10, color = "dark green", face = "bold") # normal
qp2 <- ggqqplot(df$Peso, title = "Weight (g) QQ Plot") + font("title", size = 10,
color = "dark green", face = "bold") # normal
qp3 <- ggqqplot(df$Calls5min, title = "Call Rate Interval (Calls/5min) QQ Plot") +
font("title", size = 10, color = "dark green", face = "bold") # normal and looks exactly the same as CRR variable!
qp4 <- ggqqplot(df$CRR, title = "Call Repetition Rate (Calls/min) QQ Plot") +
font("title", size = 10, color = "dark green", face = "bold") # derived this from Calls5min, normal
qp5 <- ggqqplot(df$NC, title = "Notes per Call QQ Polt") + font("title", size = 10,
color = "dark green", face = "bold") # normal
qp6 <- ggqqplot(df$ND, title = "Note Duration (s) QQPlot") + font("title", size = 10,
color = "dark green", face = "bold") # normal
qp7 <- ggqqplot(df$CD, title = "Call Duration (s) QQ Plot") + font("title",
size = 10, color = "dark green", face = "bold") # normal
qp8 <- ggqqplot(df$DF, title = "Dominant Frequency (Hz) QQ Plot") + font("title",
size = 10, color = "dark green", face = "bold") # normal
qp9 <- ggqqplot(df$SPL, title = "Sound Pressure Level (dB) QQ Plot") + font("title",
size = 10, color = "dark green", face = "bold") # normal
qp10 <- ggqqplot(df$`Agg Entropy`, title = "Aggregate Entropy QQ Plot") + font("title",
size = 10, color = "dark green", face = "bold") # not normal
qp11 <- ggqqplot(df$`Avg Entropy`, title = "Average Entropy QQ Plot") + font("title",
size = 10, color = "dark green", face = "bold") # not normal
# plotting qq plots all together
plot_grid(qp1, qp2, qp3, qp4, label_size = 8, nrow = 2)
plot_grid(qp5, qp6, qp7, qp8, label_size = 8, nrow = 2)
plot_grid(qp9, qp10, qp11, label_size = 8, nrow = 1)
# detaching unneeded packages
detach(package:ggpubr)
detach(package:cowplot)
Based on these plots, the two variables that appear to not fit the normality criteria for ANOVA are the two acoustic variables of entropy (Average Entropy and Aggregated Entropy). As the authors describe in their Statistical methods, I will transform these data useing a boxcox() from the {MASS} package to determine how to best transform these variables for ANOVA.
Here, I run Box-Cox transformation for linear models/ANOVAS to determine how to transform variables not fitting normality criterion based on qqplot. The boxcox() function computes and plots the log-likelihood for an object to show the exponent power (\(\lambda\)) the response variable in the object should be raised to transform the object to fit normality criteria for comparison in ANOVA.
• The object argument is the response variables (Avg Entropy and Agg Entropy) against the predictor variables (Sex or Sex+Snout-Vent Length (LHC)) that will be compared in an ANOVA (or in a linear model, but not doing that here). • The lambda argument is the vector values to check as potential exponential powers to raise the variable to in a transformation. By default, the function uses (-2, 2) with 0.1 interval steps. Functionally, this is the x variable in the plot output.
• plotit = is a logical argument, and when it is set to TRUE, it will output a plot of the 95% confidence log-likelihood curve. Setting it to FALSE prints all the the x vakues and y values separately.
• interp = instead of fitting a single, high-degree polynomial to all of the values at once. spline interpolation fits low-degree polynomials to small subsets of the values
require(MASS) # reported used for normalizing data for ANOVA
## Loading required package: MASS
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
# boxcox of agg entropy vs sex.
boxcox(df$`Agg Entropy` ~ df$Sex, lambda = seq(-2, 2, 1/10), plotit = TRUE, # "object"
interp = TRUE, eps = 1/50, xlab = expression(lambda), #interp TRUE gives slightly lower lambda (like 0.9 vs 0.8 with FALSE)
ylab = "log-Likelihood")
# transformation for Aggregated Entropy Vs Sex
df <- df %>% mutate(AggE_t1 = (`Agg Entropy`)^(-9/10)) # adding column of transformed AggEntropy
# boxcox of agg entropy vs Sex corrected for snout-vent length
#boxcox(df$`Agg Entropy` ~ df$Sex + df$LHC, lambda = seq(-2, 2, 1/10), plotit = TRUE, # object
# interp = TRUE, eps = 1/50, xlab = expression(lambda), #"interp"
# ylab = "log-Likelihood")
# transformation for Aggregated Entropy Vs Sex + SVL
#df <- df %>% mutate(AggE_t2 = (`Agg Entropy`)^(-7/10)) # adding column of transformed AggEntropy
# boxcox of Avg entropy vs Sex
boxcox(df$`Avg Entropy` ~ df$Sex, lambda = seq(-2, 2, 1/10), plotit = TRUE, # "object"
interp = TRUE, eps = 1/50, xlab = expression(lambda), #interp TRUE gives slightly lower lambda (like 0.9 vs 0.8 with FALSE)
ylab = "log-Likelihood")
# transformation for Aggregated Entropy Vs Sex
df <- df %>% mutate(AvgE_t1 = (`Agg Entropy`)^(-1/10)) # adding column of transformed AvgEntropy
In the methods, the authors reported using {car} package for Anova function (though they do not report the type). Below, for each acoustic variable included in Table 1, I fist make an ANOVA model (using the aov() function) of the acoustic variable to see if it varies between the three sexes (female (F), pregnant male (MP), non-pregnant male (NMP): aov(data = df, [acoustic variable] ~ Sex) I then print the summary of the original ANOVA model to compar the F statistic and level of significance to that reported in table 1.
#sex as predictor, NOT correcting by SVL = Sex+SVL as this gives a different F-value than what the authors report in table 1.
# Snout-Vent Length (mm) between different Sexes.
SVL_sex <- aov(data = df, LHC ~ Sex) # SVL = LHC
summary.aov(SVL_sex) # print the model summary
## Df Sum Sq Mean Sq F value Pr(>F)
## Sex 2 8.312 4.156 6.583 0.00469 **
## Residuals 27 17.046 0.631
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 1 observation deleted due to missingness
# pulling out the F-statistic (nonparametric)
SVL_sex.F <- SVL_sex %>% # pull out the F-value
generics::tidy() %>% # a nice table
filter(term == "Sex") # since comparing between sexes
SVL_sex.F$statistic # prints the omnibus F-statistic of the between Sex comparison of SVL alone (full table will also show the p-value). matches the F-value and significance level reported in Table 1.
## [1] 6.582981
# Mass (g) between different Sexes.
mass_sex <- aov(data = df, Peso ~ Sex) # Peso = Mass/weight
summary.aov(mass_sex) # print the model summary
## Df Sum Sq Mean Sq F value Pr(>F)
## Sex 2 0.3655 0.18275 25.43 6.17e-07 ***
## Residuals 27 0.1940 0.00719
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 1 observation deleted due to missingness
# pulling out the F-statistic (nonparametric)
mass_sex.F <- mass_sex %>% # pull out the F-value
generics::tidy() %>% # a nice table
filter(term == "Sex") # since comparing between sexes
mass_sex.F$statistic # prints the omnibus F-statistic of the between Sex comparison of SVL alone (full table will also show the p-value). matches the F-value and significance level reported in Table 1!
## [1] 25.43115
# Note Duration (ND, s) between different Sexes. the article reports this is in ms in Table 1, but, based on the data set, they are in seconds.
ND_sex <- aov(data = df, ND ~ Sex) # model
summary.aov(ND_sex) # print the model summary
## Df Sum Sq Mean Sq F value Pr(>F)
## Sex 2 0.007553 0.003776 4.97 0.0142 *
## Residuals 28 0.021276 0.000760
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# pulling out the F-statistic (nonparametric)
ND_sex.F <- ND_sex %>% # pull out the F-value
generics::tidy() %>% # a nice table
filter(term == "Sex") # since comparing between sexes
ND_sex.F$statistic # prints the omnibus F-statistic of the between Sex comparison of SVL alone (full table will also show the p-value). matches the F-value and significance level reported in Table 1.
## [1] 4.969906
# Call Duration (CD, s) between different Sexes.
CD_sex <- aov(data = df, CD ~ Sex) # model
summary.aov(CD_sex) # print the model summary
## Df Sum Sq Mean Sq F value Pr(>F)
## Sex 2 0.4983 0.24917 8.122 0.00191 **
## Residuals 25 0.7669 0.03068
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 3 observations deleted due to missingness
# pulling out the F-statistic (nonparametric)
CD_sex.F <- CD_sex %>% # pull out the F-value
generics::tidy() %>% # a nice table
filter(term == "Sex") # since comparing between sexes
CD_sex.F$statistic # prints the omnibus F-statistic of the between Sex comparison of SVL alone (full table will also show the p-value). matches the F-value and significance level reported in Table 1.
## [1] 8.122413
# Dominant Frequency (DF, Hz) between different Sexes.
DF_sex <- aov(data = df, DF ~ Sex) # model
summary.aov(DF_sex) # print the model summary
## Df Sum Sq Mean Sq F value Pr(>F)
## Sex 2 561049 280524 3.998 0.0297 *
## Residuals 28 1964686 70167
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# pulling out the F-statistic (nonparametric)
DF_sex.F <- DF_sex %>% # pull out the F-value
generics::tidy() %>% # a nice table
filter(term == "Sex") # since comparing between sexes
DF_sex.F$statistic # prints the omnibus F-statistic of the between Sex comparison of SVL alone (full table will also show the p-value). matches the F-value and significance level reported in Table 1.
## [1] 3.997933
# Aggregated Entropy between different Sexes.
AggE_sex <- aov(data = df, `Agg Entropy` ~ Sex) # untransformed, also pull out F-value
summary.aov(AggE_sex) # print the model summary
## Df Sum Sq Mean Sq F value Pr(>F)
## Sex 2 0.445 0.2223 0.757 0.479
## Residuals 28 8.226 0.2938
# pulling out the F-statistic (nonparametric)
AggE_sex.F <- AggE_sex %>%
generics::tidy() %>%
filter(term == "Sex") # since comparing between sexes
AggE_sex.F$statistic # prints the omnibus F-statistic of the between Sex comparison of Agg Entropy alone (full table will also show the p-value)
## [1] 0.7566687
# to show type 2 and 3 print the same F-value
# type 2 ANOVA
AggE_sex.a2 <- Anova(AggE_sex, type = "II")
# print the results
AggE_sex.a2 # the F value here IS EXACTLY WHAT THEY GOT IN TABLE 1????? so apparently, despite this variable not fulfilling normality criteria for linear models. This is very confusing to me. # type 3 ANOVA
## Anova Table (Type II tests)
##
## Response: Agg Entropy
## Sum Sq Df F value Pr(>F)
## Sex 0.4446 2 0.7567 0.4786
## Residuals 8.2265 28
AggE_sex.a3 <- Anova(AggE_sex, type = "III")
# print the results
AggE_sex.a3
## Anova Table (Type III tests)
##
## Response: Agg Entropy
## Sum Sq Df F value Pr(>F)
## (Intercept) 47.286 1 160.9427 3.967e-13 ***
## Sex 0.445 2 0.7567 0.4786
## Residuals 8.226 28
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# All F-values and Sums of squares of the untransformed Agg Entropy vs Sex for three types of ANOVA, meaning the variables are balanced and factors are orthogonal, so can trust the default type 1.
# AggE vs Sex transformed (t1) vs sex; did this because the authors report doing boxcox-informed transformations, but gives a different F-value than what they reported in Table 1, so have commented them out.
# AggE_sex <- aov(data = df, AggE_t1 ~ Sex)
# AggE_sex.a <- Anova(AggE_sex, type = c(2))
# AggE_sex.a
# Average Entropy between different Sexes. Because of the above, I did not use the boxcox transformed Avg Entropy, and it works out that the F-value calculated below is the exact same as that reported in Table 1.
AvgE_sex <- aov(data = df, `Avg Entropy` ~ Sex) # untransformed, also pull out F-value
summary.aov(AvgE_sex) #printing the summary
## Df Sum Sq Mean Sq F value Pr(>F)
## Sex 2 0.024 0.01193 0.097 0.908
## Residuals 28 3.438 0.12277
# to show type 2 and 3 print the same F-value
# type 2 ANOVA
AvgE_sex.a2 <- Anova(AvgE_sex, type = "II")
# print the results
AvgE_sex.a2 # same as the original and same as what the authors reported.
## Anova Table (Type II tests)
##
## Response: Avg Entropy
## Sum Sq Df F value Pr(>F)
## Sex 0.0239 2 0.0972 0.9077
## Residuals 3.4375 28
AvgE_sex.a3 <- Anova(AvgE_sex, type = "III")
# print the results
AvgE_sex.a3 # same as the original and same as what the authors reported.
## Anova Table (Type III tests)
##
## Response: Avg Entropy
## Sum Sq Df F value Pr(>F)
## (Intercept) 36.485 1 297.1863 <2e-16 ***
## Sex 0.024 2 0.0972 0.9077
## Residuals 3.438 28
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# All F-values and Sums of squares of the untransformed Avg Entropy vs Sex for three types of ANOVA, meaning the variables are balanced and factors are orthogonal.
# pulling out the F-statistic (nonparametric) from the type 1 ANOVA model
AvgE_sex.F <- AvgE_sex %>%
generics::tidy() %>% # a nice table
filter(term == "Sex") # since comparing between sexes
AvgE_sex.F$statistic #AvgE F-value. matches what's reported in Table 1
## [1] 0.09718057
# Call Rate (calls/min) between the Sexes.
CRR_sex <- aov(data = df, CRR ~ Sex)
summary.aov(CRR_sex) #printing the model summary
## Df Sum Sq Mean Sq F value Pr(>F)
## Sex 2 1.032 0.5160 1.819 0.181
## Residuals 28 7.943 0.2837
# pulling out the F-statistic (nonparametric)
CRR_sex.F <- CRR_sex %>% # pull out the F-value
generics::tidy() %>% # a nice table
filter(term == "Sex") # since comparing between sexes
CRR_sex.F$statistic #Call Rate F-value, matches the F-value reported in Table 1
## [1] 1.818948
# Sound Pressure Level (SPL) between the Sexes.
SPL_sex <- aov(data = df, SPL ~ Sex)
summary.aov(SPL_sex) #printing the summary
## Df Sum Sq Mean Sq F value Pr(>F)
## Sex 2 45.2 22.59 1.374 0.274
## Residuals 22 361.7 16.44
## 6 observations deleted due to missingness
# pulling out the F-statistic (nonparametric)
SPL_sex.F <- SPL_sex %>% # pull out the F-value
generics::tidy() %>% # a nice table
filter(term == "Sex") # since comparing between sexes
SPL_sex.F$statistic # SPL F-value
## [1] 1.373854
# making a table of the calculated mean and standard deviation for each acoustic variable reported in Table 1
bySex <- group_by(df, Sex)
Acoustics <- bySex %>%
summarise(n_cases = n(), # sample size in group
mean_SVL = mean(LHC, na.rm = TRUE), # mean SVL
sd_SVL = sd(LHC, na.rm = TRUE), # standard deviation SVL
mean_mass = mean(Peso, na.rm = TRUE), # mean mass (g)
sd_mass = sd(Peso, na.rm = TRUE), # sd mass
mean_ND = mean(ND, na.rm = TRUE), # mean note duration (s)
sd_ND = sd(ND, na.rm = TRUE),
mean_CD = mean(CD, na.rm = TRUE), # mean call duration (s)
sd_CD = sd(CD, na.rm = TRUE),
mean_DF = mean(DF, na.rm = TRUE), # mean dominant freq (Hz)
sd_DF = sd(DF, na.rm = TRUE),
mean_AggE = mean(`Agg Entropy`, na.rm = TRUE), # mean Aggregated Entropy
sd_AggE = sd(`Agg Entropy`, na.rm = TRUE),
mean_AvgE = mean(`Avg Entropy`, na.rm = TRUE), # mean Average Entropy
sd_AvgE = sd(`Avg Entropy`, na.rm = TRUE),
mean_CRR = mean(CRR, na.rm = TRUE), # mean Call/min
sd_CRR = sd(CRR, na.rm = TRUE),
mean_SPL = mean(SPL, na.rm = TRUE), # mean sound pressure level (dB)
sd_SPL = sd(SPL, na.rm = TRUE)
)
Acoustics <- Acoustics %>% # changing labels back to their original state for the table here.
mutate(Sex = case_when(Sex == "F" ~ "Females",
Sex == "NPM" ~ "Non-pregnant males",
Sex == "PM" ~ "Pregnant males"))
# transpose(l, fill=NA, ignore.empty=FALSE, keep.names=NULL, make.names=NULL)
Acoustics_t <- transpose(Acoustics, keep.names = "Acoustic Variable") #transpose data frame
Acoustics_t
## Acoustic Variable V1 V2 V3
## 1 Sex Females Non-pregnant males Pregnant males
## 2 n_cases 9 11 11
## 3 mean_SVL 23.2222222222222 21.99 22.1727272727273
## 4 sd_SVL 0.974394398816231 0.768042244208538 0.643569590783948
## 5 mean_mass 1.06222222222222 0.79 0.964545454545455
## 6 sd_mass 0.101707642015949 0.0673300329224139 0.0839480358750146
## 7 mean_ND 0.191305666333333 0.152636938090909 0.165423762818182
## 8 sd_ND 0.0325399835957568 0.0256564327093529 0.0249443852582658
## 9 mean_CD 1.87322247025 1.5454524386 1.6341607955
## 10 sd_CD 0.201837134488656 0.165680686897357 0.161488820849156
## 11 mean_DF 3390.71382922222 3711.98892163636 3651.45401072727
## 12 sd_DF 245.998679493103 228.989746357455 309.224874457775
## 13 mean_AggE 2.292149032 2.01411166836364 2.04578614972727
## 14 sd_AggE 0.536556919319484 0.469465368911618 0.609866842967914
## 15 mean_AvgE 2.01343139866667 1.95511875736364 1.94988657527273
## 16 sd_AvgE 0.280593615650822 0.397587255429383 0.350271792331423
## 17 mean_CRR 1.71111111111111 1.45454545454545 1.25454545454545
## 18 sd_CRR 0.4371625682868 0.522232967867094 0.607229176445988
## 19 mean_SPL 64.38023576625 61.5763027144444 64.37328545625
## 20 sd_SPL 3.7557077162866 4.05125858350565 4.33670962047346
#redefine row and column names
#rownames(Acoustics_t) <- colnames(Acoustics)
#colnames(Acoustics_t) <- rownames(Acoustics)
#display transposed data frame
#Acoustics_t
[Inferential statistical analysis]
Whiskers represent the 90th and 10th percentiles.
require(cowplot)
# A Boxplot of Note Duration
fig3_A <- ggplot(data = df, aes(x = Sex, y = ND)) + # made the object
geom_boxplot() + # boxplot
labs(x = "Sex", y = "Note Duration (s)", title = "A" ) + # axis labels and title with figure label
scale_y_continuous(breaks = c(0.15, 0.20, 0.25, 0.30), # so axis matches that in the article
position = 'left') +
theme_classic()
# B Boxplot of Call Duration
fig3_B <- ggplot(data = df, aes(x = Sex, y = CD)) +
geom_boxplot() +
labs(x = "Sex", y = "Call Duration (s)", title = "B") + # scaling didn't work here. RIP
theme_classic()
# C Boxplot of Dominant Frequency
fig3_C <- ggplot(data = df, aes(x = Sex, y = DF)) + #define object
geom_boxplot() + # boxplot
labs(x = "Sex", y = "Dominant Frequency (Hz)", title = "C") + # axis labels and title with figure label, couldn't get the y-axis label to match here either.
theme_classic()
# organize the output of figures using cowplot
plot_grid(fig3_A, fig3_B, fig3_C, label_size = 8, nrow = 1)
detach(package:cowplot)
• one visualization filter data for 4 notes per call (NC == 4, n = 22). drop_na() using MASS::lda() Linear Discriminant Analysis LD1 = f and npm, LD2 = f and pm
groups + x1 + x2
df <- df %>%
drop_na(DF, ND, CRR, `Agg Entropy`, SPL)
# lda0 <- MASS::lda(Sex ~ DF + ND + CRR + `Agg Entropy` + SPL, data = df)
# normalize : subtract mean, divide by SD,
Narrative section that overviews how successful were you at replicating the analyses and visualizations in the study.
1. What problems did you encounter? Some of the data described in the methods isn’t included in the data frame. 2. Why might you have encountered those problems?
3. What details were lacking from the original study’s methods that might have hampered your ability to replicate the authors’ results?
• Sandmeier, F. (2016). Rhinoderma darwinii [Encyclopedia]. AmphibiaWeb; University of California, Berkeley, CA, USA. https://amphibiaweb.org/species/4322
• Serrano, J. M., Penna, M., Valenzuela-Sánchez, A., Mendez, M. A., & Azat, C. (2020). Monomorphic call structure and dimorphic vocal phenology in a sex-role reversed frog. Behavioral Ecology and Sociobiology, 74(10), 127. https://doi.org/10.1007/s00265-020-02903-3